Recent Advances in the Automatic Recognition of Audio-Visual Speech

نویسندگان

Gerasimos Potamianos

Guillaume Gravier

Ashutosh Garg

چکیده

Visual speech information from the speaker’s mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audio-visual automatic speech recognition and present novel contributions in two main areas: First, the visual front end design, based on a cascade of linear image transforms of an appropriate video region-of-interest, and subsequently, audio-visual speech integration. On the latter topic, we discuss new work on feature and decision fusion combination, the modeling of audio-visual speech asynchrony, and incorporating modality reliability estimates to the bimodal recognition process. We also briefly touch upon the issue of audio-visual adaptation. We apply our algorithms to three multi-subject bimodal databases, ranging from smallto largevocabulary recognition tasks, recorded in both visually controlled and challenging environments. Our experiments demonstrate that the visual modality improves automatic speech recognition over all conditions and data considered, though less so for visually challenging environments and large vocabulary tasks.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

‘vVISWa’ – A Multilingual Multi-Pose Audio Visual Database for Robust Human Computer Interaction

Automatic Speech Recognition (ASR) by machine is an attractive research topic in signal processing domain and has attracted many researchers to contribute in this area of signal processing and pattern recognition. In recent year, there have been many advances in automatic speech reading system with the inclusion of audio and visual speech features to recognize words under noisy conditions. The ...

متن کامل

Recognition of isolated words using Zernike and MFCC features for audio visual speech recognition

Automatic Speech Recognition (ASR) by machine is an attractive research topic in signal processing domain and has attracted many researchers to contribute in this area. In recent year, there have been many advances in automatic speech reading system with the inclusion of audio and visual speech features to recognize words under noisy conditions. The objective of audio-visual speech recognition ...

متن کامل

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

متن کامل

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...

متن کامل

Automatic Speech Reading

Despite significant advances in the area of Automatic Speech Recognition, (ASR) systems still result in degraded performance when applied to real world conditions i.e. in the presence of noise. Research with human subjects has shown that visual information from the speaker’s face provides extensive benefit to speech recognition in difficult listening conditions. Audio Visual Automatic Speech Re...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2002

Recent Advances in the Automatic Recognition of Audio-Visual Speech

نویسندگان

چکیده

منابع مشابه

‘vVISWa’ – A Multilingual Multi-Pose Audio Visual Database for Robust Human Computer Interaction

Recognition of isolated words using Zernike and MFCC features for audio visual speech recognition

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

Automatic Speech Reading

عنوان ژورنال:

اشتراک گذاری